T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition

Yeh, Chen, Chang, You-Ming, Chiu, Wei-Chen, Yu, Ning

Neural Information Processing Systems

Warning: This paper contains inappropriate/harmful visual content. While widespread access to the Internet and the rapid advancement of generative models boost people's creativity and productivity, the risk of encountering inappropriate or harmful content also increases.


Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries

Bhattacharyya, Sree, Singla, Yaman Kumar, Yarram, Sudhir, Singh, Somesh Kumar, S, Harini I, Wang, James Z.

arXiv.org Artificial Intelligence

Visual content memorability has intrigued the scientific community for decades, with applications ranging widely, from understanding nuanced aspects of human memory to enhancing content design. A significant challenge in progressing the field lies in the expensive process of collecting memorability annotations from humans. This limits the diversity and scalability of datasets for modeling visual content memorability. Most existing datasets are limited to collecting aggregate memorability scores for visual content, not capturing the nuanced memorability signals present in natural, open-ended recall descriptions. In this work, we introduce the first large-scale unsupervised dataset designed explicitly for modeling visual memorability signals, containing over 82,000 videos, accompanied by descriptive recall data. We leverage tip-of-the-tongue (ToT) retrieval queries from online platforms such as Reddit. We demonstrate that our unsupervised dataset provides rich signals for two memorability-related tasks: recall generation and ToT retrieval. Large vision-language models fine-tuned on our dataset outperform state-of-the-art models such as GPT-4o in generating open-ended memorability descriptions for visual content. We also employ a contrastive training strategy to create the first model capable of performing multimodal ToT retrieval. Our dataset and models present a novel direction, facilitating progress in visual content memorability research.
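The contrastive training strategy mentioned above is not detailed in the abstract; as a rough illustration, a minimal InfoNCE-style objective for aligning tip-of-the-tongue query embeddings with video embeddings could be sketched as follows. The encoder outputs, embedding size, and temperature are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, video_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (ToT query, video) embedding pairs."""
    q = F.normalize(query_emb, dim=-1)   # (B, D) text-query embeddings
    v = F.normalize(video_emb, dim=-1)   # (B, D) video embeddings
    logits = q @ v.t() / temperature     # (B, B) cosine-similarity matrix
    targets = torch.arange(q.size(0))
    # Matching pairs sit on the diagonal; all other in-batch pairs act as negatives.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random tensors standing in for encoder outputs.
queries = torch.randn(8, 512)
videos = torch.randn(8, 512)
print(info_nce_loss(queries, videos).item())
```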


Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems

Lumer, Elias, Cardenas, Alex, Melich, Matt, Mason, Myles, Dieter, Sara, Subbiah, Vamse Kumar, Basavaraju, Pradeep Honaganahalli, Hernandez, Roberto

arXiv.org Artificial Intelligence

Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models (LLMs) to access multimodal knowledge bases containing both text and visual information such as charts, diagrams, and tables in financial documents. However, existing multimodal RAG systems rely on LLM-based summarization to convert images into text during preprocessing, storing only text representations in vector databases, which causes loss of contextual information and visual details critical for downstream retrieval and question answering. To address this limitation, we present a comprehensive comparative analysis of two retrieval approaches for multimodal RAG systems: text-based chunk retrieval (where images are summarized into text before embedding) and direct multimodal embedding retrieval (where images are stored natively in the vector space). We evaluate both approaches across 6 LLMs and two multimodal embedding models on a newly created financial earnings call benchmark comprising 40 question-answer pairs, each paired with 2 documents (1 image and 1 text chunk). Experimental results demonstrate that direct multimodal embedding retrieval significantly outperforms LLM-summary-based approaches, achieving absolute improvements of 13% in mean average precision (mAP@5) and 11% in normalized discounted cumulative gain (nDCG@5). These gains correspond to relative improvements of 32% in mAP@5 and 20% in nDCG@5, underscoring their practical impact. We additionally find that direct multimodal retrieval produces more accurate and factually consistent answers as measured by LLM-as-a-judge pairwise comparisons. We demonstrate that LLM summarization introduces information loss during preprocessing, whereas direct multimodal embeddings preserve visual context for retrieval and inference.
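For readers unfamiliar with the ranking metrics quoted above, the following is a generic, textbook-style computation of mAP@5 and nDCG@5 from binary relevance labels; it is a sketch of the metric definitions, not the benchmark's actual evaluation code.

```python
import math

def average_precision_at_k(ranked_relevance, num_relevant, k=5):
    """ranked_relevance: 0/1 flags for retrieved items in rank order; num_relevant: total relevant docs."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / i
    return precision_sum / min(num_relevant, k) if num_relevant else 0.0

def ndcg_at_k(ranked_relevance, k=5):
    """Discounted cumulative gain at k, normalized by the ideal (perfectly sorted) ranking."""
    dcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ranked_relevance[:k], start=1))
    ideal = sorted(ranked_relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

# Example: the two ground-truth documents for a question retrieved at ranks 1 and 3.
labels = [1, 0, 1, 0, 0]
print(average_precision_at_k(labels, num_relevant=2), ndcg_at_k(labels))
```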


More than a Moment: Towards Coherent Sequences of Audio Descriptions

Khandelwal, Eshika, Xie, Junyu, Han, Tengda, Bain, Max, Nagrani, Arsha, Zisserman, Andrew, Varol, Gül, Tapaswi, Makarand

arXiv.org Artificial Intelligence

Audio Descriptions (ADs) convey essential on-screen information, allowing visually impaired audiences to follow videos. To be effective, ADs must form a coherent sequence that helps listeners to visualise the unfolding scene, rather than describing isolated moments. However, most automatic methods generate each AD independently, often resulting in repetitive, incoherent descriptions. To address this, we propose a training-free method, CoherentAD, that first generates multiple candidate descriptions for each AD time interval, and then performs auto-regressive selection across the sequence to form a coherent and informative narrative. To evaluate AD sequences holistically, we introduce a sequence-level metric, StoryRecall, which measures how well the predicted ADs convey the ground truth narrative, alongside repetition metrics that capture the redundancy across consecutive AD outputs. Our method produces coherent AD sequences with enhanced narrative understanding, outperforming prior approaches that rely on independent generations.
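As a toy illustration of the generate-then-select idea described above (not the authors' scoring model), greedy auto-regressive selection over candidate descriptions might look as follows; the word-overlap score is a hypothetical stand-in for a learned coherence measure.

```python
from typing import List

def score(candidate: str, narrative: List[str]) -> float:
    """Toy coherence score: prefer candidates that add new words over ones that repeat the narrative."""
    seen = set(" ".join(narrative).lower().split())
    words = candidate.lower().split()
    novel = sum(1 for w in words if w not in seen)
    return novel / max(len(words), 1)

def select_coherent_sequence(candidates_per_interval: List[List[str]]) -> List[str]:
    """Pick one candidate per AD interval, conditioning each choice on the descriptions already chosen."""
    narrative: List[str] = []
    for candidates in candidates_per_interval:  # intervals in temporal order
        best = max(candidates, key=lambda c: score(c, narrative))
        narrative.append(best)
    return narrative

# Toy usage: two intervals with two candidate descriptions each.
print(select_coherent_sequence([
    ["A man enters the dim kitchen.", "A man walks in."],
    ["He opens the fridge and frowns.", "A man enters the kitchen."],
]))
```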